<<<<<<< HEAD Spatial-Regression.knit

SISMID Spatial Satistics

Data Lab 4 Disease Mapping with Conditional Autoregressive Models


4.1 Introduction to Conditional Autoregressive Models

We will first load the dataframe with shapefile of chlamydia incidence and income from previous lab.

library (sf)
library(here)
library (spdep)
library (ggplot2)
library (spatialreg)
library (INLA)


load (here("data", "Data_Areal.RData") )
st_drop_geometry(dat.areal)[1:4,]
##                ID  FIPS           Area State Population Cases Income inc
## 1 alabama,autauga 01001 Autauga County    AL      49960   184  42013  37
## 2 alabama,baldwin 01003 Baldwin County    AL     171769   416  40250  24
## 3 alabama,barbour 01005 Barbour County    AL      27941   165  25101  59
## 4    alabama,bibb 01007    Bibb County    AL      21535    61  31420  28

CAR models can be fitted using the spautolm function in library spatialreg. The function takes a regression formula similar to lm, with a spatial weight matrix defined as an nb2listw object. We will examine both first-order and second-order (neighbors of 1-first order neighbors).

dat.areal_proj = st_transform(dat.areal, crs = "ESRI:102004")
nb = poly2nb (dat.areal_proj)
nb = nblag (nb, 2) #Create 2nd order and this makes nb into a list

#2nd-order has about twice as many links
nb[[1]]
## Neighbour list object:
## Number of regions: 226 
## Number of nonzero links: 1258 
## Percentage nonzero weights: 2.462996 
## Average number of links: 5.566372
nb[[2]] 
## Neighbour list object:
## Number of regions: 226 
## Number of nonzero links: 2534 
## Percentage nonzero weights: 4.961234 
## Average number of links: 11.21239
#Create adjacency matrix: 
#Style B = binary for edges
W = nb2listw(nb[[1]], style="B")
W2 = nb2listw(nb[[2]], style="B") 

## Standard linear regression
Y = dat.areal$inc
X = (dat.areal$Income-mean (dat.areal$Income))/1000 #center and scale to per $1,000
fit0 = lm (Y~X)

##Fit CAR model
fit1.car = spautolm (Y~X, family = "CAR", listw = W)
fit2.car = spautolm (Y~X, family = "CAR", listw = W2)

The summary from model fit gives similar outputs as with linear regression. The new parameter lamda corresponds to the CAR parameter.

summary (fit1.car)
## 
## Call: spautolm(formula = Y ~ X, listw = W, family = "CAR")
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -46.2595 -13.2726  -4.3352   9.3328  72.2607 
## 
## Coefficients: 
##             Estimate Std. Error z value  Pr(>|z|)
## (Intercept) 41.90003    3.54660 11.8142 < 2.2e-16
## X           -1.32857    0.21805 -6.0931 1.107e-09
## 
## Lambda: 0.15241 LR test value: 46.182 p-value: 1.0778e-11 
## Numerical Hessian standard error of lambda: 0.01002 
## 
## Log likelihood: -1018.587 
## ML residual variance (sigma squared): 430.96, (sigma: 20.76)
## Number of observations: 226 
## Number of parameters estimated: 4 
## AIC: NA (not available for weighted model)

4.2 Disease Mapping

Instead of working with a continous rate (assumed Normal), next we will model the case count directly using Poisson regression. We will compare these to a naive analysis where we don’t borrow information around neighboring counties.

Y = dat.areal$Case
P = dat.areal$Population
X = (dat.areal$Income-mean (dat.areal$Income))/1000 #center and scale to per $1,000

#First define adjaceny matrix
nb = poly2nb (dat.areal_proj)
W = nb2mat (nb, style = "B")  #B = binary

The maximum likelihood estimate for the incidence rate is given by \(Y_i/P_i\). We also calculate the standard error associated with \(log (Y_i/P_i)\), which we will compare to estimates from spatial models.

#MLE Estimates
u.mle = Y/P
se.mle = sqrt(Y)/P

We will use INLA here to perform Bayesian estimation. INLA is an efficient approach to conduct Bayesian analysis with hierarchical models when the latent variables (random effects) are Gaussian. Here we will assume the random effects are either iid (exchangeable) or follow have a spatial structure (i.e., besag or propor Leroux). Specific parametrization of these random effect distributions and priors for hyperparameters can be found here (https://inla.r-inla-download.org/r-inla.org/doc/latent/).

#Some data wrangling first
dat.areal$areal_ID = 1:nrow (dat.areal) #Create a county ID from 1 to 226

#INLA needs a specific summary of adjacency structure
nb2INLA ("adj.txt", nb) #write out a file
G <- inla.read.graph(filename = "adj.txt") #read in to get INLA's graph format

#Fit exchangeable (random effect model)
#Note: "E" here is the offset and it does not need to be log-transofrmed
fit.exch = inla (Cases~1+f(areal_ID), E= Population, family = "poisson", data = dat.areal,
                 control.compute = list(dic = TRUE, waic = TRUE, return.marginals.predictor=TRUE))

#Fit improper CAR model (besag)
fit.iCAR = inla (Cases~1+f(areal_ID, model = "besag", graph = G), E= Population, family = "poisson", data = dat.areal,control.compute = list(dic = TRUE, waic = TRUE))

#Fit proper CAR model (Leroux)
fit.pCAR = inla (Cases~1+f(areal_ID, model = "besagproper2", graph = G), E= Population, family = "poisson", data = dat.areal,control.compute = list(dic = TRUE, waic = TRUE))

#Fit BYM model (besag + idd)
#Need to create another set of county IDs because we have two random effect distributions
dat.areal$areal_ID_2 = 1:nrow (dat.areal) #Create a county ID from 1 to 226
fit.conv = inla (Cases~1+f(areal_ID, model = "besag", graph = G) + 
                   f(areal_ID_2), E= Population, family = "poisson", data = dat.areal,control.compute = list(dic = TRUE, waic = TRUE))

#We can use summary to get a lot of model information.
summary (fit.conv)
## 
## Call:
##    c("inla.core(formula = formula, family = family, contrasts = contrasts, 
##    ", " data = data, quantiles = quantiles, E = E, offset = offset, ", " 
##    scale = scale, weights = weights, Ntrials = Ntrials, strata = strata, 
##    ", " lp.scale = lp.scale, link.covariates = link.covariates, verbose = 
##    verbose, ", " lincomb = lincomb, selection = selection, control.compute 
##    = control.compute, ", " control.predictor = control.predictor, 
##    control.family = control.family, ", " control.inla = control.inla, 
##    control.fixed = control.fixed, ", " control.mode = control.mode, 
##    control.expert = control.expert, ", " control.hazard = control.hazard, 
##    control.lincomb = control.lincomb, ", " control.update = 
##    control.update, control.lp.scale = control.lp.scale, ", " 
##    control.pardiso = control.pardiso, only.hyperparam = only.hyperparam, 
##    ", " inla.call = inla.call, inla.arg = inla.arg, num.threads = 
##    num.threads, ", " blas.num.threads = blas.num.threads, keep = keep, 
##    working.directory = working.directory, ", " silent = silent, inla.mode 
##    = inla.mode, safe = FALSE, debug = debug, ", " .parent.frame = 
##    .parent.frame)") 
## Time used:
##     Pre = 0.371, Running = 0.681, Post = 0.021, Total = 1.07 
## Fixed effects:
##               mean    sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) -5.569 0.023     -5.616   -5.569     -5.523   NA   0
## 
## Random effects:
##   Name     Model
##     areal_ID Besags ICAR model
##    areal_ID_2 IID model
## 
## Model hyperparameters:
##                          mean    sd 0.025quant 0.5quant 0.975quant mode
## Precision for areal_ID   1.77 0.478      0.949     1.73       2.81   NA
## Precision for areal_ID_2 9.87 3.194      5.428     9.27      17.81   NA
## 
## Deviance Information Criterion (DIC) ...............: 1921.25
## Deviance Information Criterion (DIC, saturated) ....: -600.02
## Effective number of parameters .....................: 213.76
## 
## Watanabe-Akaike information criterion (WAIC) ...: 1863.59
## Effective number of parameters .................: 112.78
## 
## Marginal log-Likelihood:  -1429.88 
##  is computed 
## Posterior summaries for the linear predictor and the fitted values are computed
## (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
#Get the fixed effects out
fixed.eff = rbind (fit.exch$summary.fixed, fit.iCAR$summary.fixed, fit.pCAR$summary.fixed, fit.conv$summary.fixed)
fixed.eff = data.frame (Model = c("Exch", "iCAR", "pCAR", "Conv"), round(fixed.eff, 2))
fixed.eff
##              Model  mean   sd X0.025quant X0.5quant X0.975quant mode kld
## (Intercept)   Exch -5.57 0.04       -5.66     -5.57       -5.49   NA   0
## (Intercept)1  iCAR -5.57 0.01       -5.59     -5.57       -5.55   NA   0
## (Intercept)2  pCAR -5.57 0.21       -5.99     -5.57       -5.15   NA   0
## (Intercept)3  Conv -5.57 0.02       -5.62     -5.57       -5.52   NA   0
#INLA reports precision = 1/variance. Let's convert them using the estimated posterior marginal distribution
#First create a custom function
prec_to_var = function (marg){ data.frame(inla.zmarginal(inla.tmarginal(function(x){(1/x)},marg), silent=TRUE))}

var.est = rbind ( prec_to_var(fit.exch$marginals.hyperpar[[1]]), 
                  prec_to_var(fit.iCAR$marginals.hyperpar[[1]]), 
                  prec_to_var(fit.pCAR$marginals.hyperpar[[1]]),
                  prec_to_var(fit.conv$marginals.hyperpar[[1]]),
                  prec_to_var(fit.conv$marginals.hyperpar[[2]]))
var.est = data.frame (Model = c("Exch", "iCAR", "pCAR", "Conv", "Conv"),
                         Type = c("Sigma2", "Tau2", "Tau2", "Tau2", "Sigma2"), round(var.est, 2))
var.est
##   Model   Type mean   sd quant0.025 quant0.25 quant0.5 quant0.75 quant0.975
## 1  Exch Sigma2 0.42 0.04       0.35      0.39     0.42      0.45       0.51
## 2  iCAR   Tau2 1.21 0.12       1.00      1.13     1.20      1.29       1.47
## 3  pCAR   Tau2 1.09 0.14       0.86      1.00     1.08      1.18       1.40
## 4  Conv   Tau2 0.61 0.18       0.36      0.48     0.58      0.70       1.05
## 5  Conv Sigma2 0.11 0.03       0.06      0.09     0.11      0.13       0.18

Let’s compared the estimated relative risks across models. We first collect estimates and their error from INLA models (scaled to per 10,000 people).

## Extract estimates from each INLA model and scale to per 10,000 population
dat.areal$rr.mle = u.mle*10000
dat.areal$se.mle = se.mle*10000

dat.areal$rr.exch = fit.exch$summary.fitted.values$mean*10000
dat.areal$se.exch = fit.exch$summary.fitted.values$sd*10000

dat.areal$rr.iCAR = fit.iCAR$summary.fitted.values$mean*10000
dat.areal$se.iCAR = fit.iCAR$summary.fitted.values$sd*10000

dat.areal$rr.conv = fit.conv$summary.fitted.values$mean*10000
dat.areal$se.conv = fit.conv$summary.fitted.values$sd*10000

We see that the point estimates are pretty similar across models and they capture similar spatial patterns.

line01 <- function(x,y,...){abline(0,1, col = 4,lwd=2);  points(x,y,col=2)}
plot(data.frame(MLE = dat.areal$rr.mle, Exchangeable = dat.areal$rr.exch, 
                   iCAR = dat.areal$rr.iCAR, Convolution = dat.areal$rr.conv), panel=line01)

plot.dat = data.frame (FIPS = dat.areal$FIPS,
                  type =rep(c("MLE", "Exchangeable", "iCAR", "Convolution"), each = nrow (dat.areal)),
                  rr = c(as.matrix(st_drop_geometry(dat.areal[,c("rr.mle", "rr.exch", "rr.iCAR", "rr.conv")]))))
plot.dat$type = factor(plot.dat$type, levels = c("MLE", "Exchangeable", "iCAR", "Convolution"))
plot.dat = merge (dat.areal, plot.dat, by = "FIPS")
ggplot () + geom_sf (data = plot.dat, aes (fill = rr)) + facet_wrap(~type)+
  scale_fill_gradient2(low = "white",high = "red", name = "Rate")

Let’s just compare the convoluation model and the naive estimates. We see that the larger differences in county-specific relative risks are observed in counties with less cases and smaller population. The difference in precision (measured by the standard error of relative estimates) also depends on population and case number.

par (mfrow=c(1,2))
plot ( sqrt (Y), dat.areal$rr.mle - dat.areal$rr.conv,
       ylab = "Differnece in RR", xlab = "SQRT (1/Cases)"); abline (0,0, col = "blue")
plot ( log(P), dat.areal$rr.mle - dat.areal$rr.conv,
       ylab = "Differnece in RR", xlab = "Ln (Population)"); abline (0,0, col = "blue")

par (mfrow=c(1,2))
plot ( sqrt (Y), dat.areal$se.mle - dat.areal$se.conv,
       ylab = "Differnece in SE(RR)", xlab = "SQRT (1/Cases)"); abline (0,0, col = "blue")
plot ( log(P), dat.areal$se.mle - dat.areal$se.conv,
       ylab = "Differnece in SE(RR)", xlab = "Ln (Population)"); abline (0,0, col = "blue")

Finally, let’s add household meduab income as a spatial covariate. We see that the spatial model gives a stronger negative association between income and chlamydia rates. The standard error is also much larger compared to a standard Poisson regression.

dat.areal$Income_center = (dat.areal$Income-mean (dat.areal$Income))/1000
fit = inla (Cases~Income_center+f(areal_ID, model = "besag", graph = G) + 
                   f(areal_ID_2), E= Population, family = "poisson", data = dat.areal,control.compute = list(dic = TRUE, waic = TRUE))

summary (fit)
## 
## Call:
##    c("inla.core(formula = formula, family = family, contrasts = contrasts, 
##    ", " data = data, quantiles = quantiles, E = E, offset = offset, ", " 
##    scale = scale, weights = weights, Ntrials = Ntrials, strata = strata, 
##    ", " lp.scale = lp.scale, link.covariates = link.covariates, verbose = 
##    verbose, ", " lincomb = lincomb, selection = selection, control.compute 
##    = control.compute, ", " control.predictor = control.predictor, 
##    control.family = control.family, ", " control.inla = control.inla, 
##    control.fixed = control.fixed, ", " control.mode = control.mode, 
##    control.expert = control.expert, ", " control.hazard = control.hazard, 
##    control.lincomb = control.lincomb, ", " control.update = 
##    control.update, control.lp.scale = control.lp.scale, ", " 
##    control.pardiso = control.pardiso, only.hyperparam = only.hyperparam, 
##    ", " inla.call = inla.call, inla.arg = inla.arg, num.threads = 
##    num.threads, ", " blas.num.threads = blas.num.threads, keep = keep, 
##    working.directory = working.directory, ", " silent = silent, inla.mode 
##    = inla.mode, safe = FALSE, debug = debug, ", " .parent.frame = 
##    .parent.frame)") 
## Time used:
##     Pre = 0.402, Running = 1.1, Post = 0.0892, Total = 1.59 
## Fixed effects:
##                 mean    sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept)   -5.564 0.016     -5.597   -5.564     -5.531   NA   0
## Income_center -0.034 0.005     -0.044   -0.034     -0.024   NA   0
## 
## Random effects:
##   Name     Model
##     areal_ID Besags ICAR model
##    areal_ID_2 IID model
## 
## Model hyperparameters:
##                           mean    sd 0.025quant 0.5quant 0.975quant mode
## Precision for areal_ID    1.48 0.225       1.02     1.46       1.92   NA
## Precision for areal_ID_2 23.45 9.370      11.69    21.94      49.54   NA
## 
## Deviance Information Criterion (DIC) ...............: 1919.62
## Deviance Information Criterion (DIC, saturated) ....: -601.65
## Effective number of parameters .....................: 211.08
## 
## Watanabe-Akaike information criterion (WAIC) ...: 1864.94
## Effective number of parameters .................: 113.08
## 
## Marginal log-Likelihood:  -1416.25 
##  is computed 
## Posterior summaries for the linear predictor and the fitted values are computed
## (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
prec_to_var(fit$marginals.hyperpar[[1]])
##        mean        sd quant0.025 quant0.25  quant0.5 quant0.75 quant0.975
## 1 0.6938514 0.1133778  0.5252033 0.6115126 0.6742112 0.7570979  0.9648019
prec_to_var(fit$marginals.hyperpar[[2]])
##         mean         sd quant0.025  quant0.25   quant0.5  quant0.75 quant0.975
## 1 0.04837744 0.01610103 0.02108057 0.03640378 0.04725156 0.05884403 0.08283049
##Compare to other GLM
dat.areal$logPop = log(dat.areal$Population)
summary (glm (Cases~offset (logPop) + Income_center, family = "poisson", data = dat.areal))
## 
## Call:
## glm(formula = Cases ~ offset(logPop) + Income_center, family = "poisson", 
##     data = dat.areal)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -19.140   -7.805   -2.782    0.299   42.374  
## 
## Coefficients:
##                 Estimate Std. Error  z value Pr(>|z|)    
## (Intercept)   -5.1873009  0.0043950 -1180.27   <2e-16 ***
## Income_center -0.0228930  0.0003785   -60.49   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 21775  on 225  degrees of freedom
## Residual deviance: 17893  on 224  degrees of freedom
## AIC: 19376
## 
## Number of Fisher Scoring iterations: 4
summary (glm (Cases~offset (logPop) + Income_center, family = "quasipoisson", data = dat.areal))
## 
## Call:
## glm(formula = Cases ~ offset(logPop) + Income_center, family = "quasipoisson", 
##     data = dat.areal)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -19.140   -7.805   -2.782    0.299   42.374  
## 
## Coefficients:
##                Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)   -5.187301   0.038599 -134.390  < 2e-16 ***
## Income_center -0.022893   0.003324   -6.888 5.66e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 77.13112)
## 
##     Null deviance: 21775  on 225  degrees of freedom
## Residual deviance: 17893  on 224  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 4
#Examine random effects from the model
dat.areal$gamma_est = fit$summary.random[[1]]$mean
dat.areal$gamma_se = fit$summary.random[[1]]$sd
dat.areal$theta_est = fit$summary.random[[2]]$mean
dat.areal$theta_se = fit$summary.random[[2]]$sd
ggplot () + geom_sf (data = dat.areal, aes (fill = gamma_est))+ 
  scale_fill_gradient2(low = "white",high = "red", name = "Gamma")+ggtitle("Spatial Random Effect Estimates")

ggplot () + geom_sf (data = dat.areal, aes (fill = theta_est))+ 
  scale_fill_gradient2(low = "white",high = "red", name = "Theta")+ggtitle("Independent Random Effect Estimates")

======= Spatial-Regression.knit

SISMID Spatial Satistics

Data Lab 4 Disease Mapping with Conditional Autoregressive Models


4.1 Introduction to Conditional Autoregressive Models

We will first load the dataframe with shapefile of chlamydia incidence and income from previous lab.

library (sf)
library(here)
library (spdep)
library (ggplot2)
library (spatialreg)
library (INLA)


load (here("data", "Data_Areal.RData") )
st_drop_geometry(dat.areal)[1:4,]
##                ID  FIPS           Area State Population Cases Income inc
## 1 alabama,autauga 01001 Autauga County    AL      49960   184  42013  37
## 2 alabama,baldwin 01003 Baldwin County    AL     171769   416  40250  24
## 3 alabama,barbour 01005 Barbour County    AL      27941   165  25101  59
## 4    alabama,bibb 01007    Bibb County    AL      21535    61  31420  28

CAR models can be fitted using the spautolm function in library spatialreg. The function takes a regression formula similar to lm, with a spatial weight matrix defined as an nb2listw object. We will examine both first-order and second-order (neighbors of 1-first order neighbors).

dat.areal_proj = st_transform(dat.areal, crs = "ESRI:102004")
nb = poly2nb (dat.areal_proj)
nb = nblag (nb, 2) #Create 2nd order and this makes nb into a list

#2nd-order has about twice as many links
nb[[1]]
## Neighbour list object:
## Number of regions: 226 
## Number of nonzero links: 1258 
## Percentage nonzero weights: 2.462996 
## Average number of links: 5.566372
nb[[2]] 
## Neighbour list object:
## Number of regions: 226 
## Number of nonzero links: 2534 
## Percentage nonzero weights: 4.961234 
## Average number of links: 11.21239
#Create adjacency matrix: 
#Style B = binary for edges
W = nb2listw(nb[[1]], style="B")
W2 = nb2listw(nb[[2]], style="B") 

## Standard linear regression
Y = dat.areal$inc
X = (dat.areal$Income-mean (dat.areal$Income))/1000 #center and scale to per $1,000
fit0 = lm (Y~X)

##Fit CAR model
fit1.car = spautolm (Y~X, family = "CAR", listw = W)
fit2.car = spautolm (Y~X, family = "CAR", listw = W2)

The summary from model fit gives similar outputs as with linear regression. The new parameter lamda corresponds to the CAR parameter.

summary (fit1.car)
## 
## Call: spautolm(formula = Y ~ X, listw = W, family = "CAR")
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -46.2595 -13.2726  -4.3352   9.3328  72.2607 
## 
## Coefficients: 
##             Estimate Std. Error z value  Pr(>|z|)
## (Intercept) 41.90003    3.54660 11.8142 < 2.2e-16
## X           -1.32857    0.21805 -6.0931 1.107e-09
## 
## Lambda: 0.15241 LR test value: 46.182 p-value: 1.0778e-11 
## Numerical Hessian standard error of lambda: 0.01002 
## 
## Log likelihood: -1018.587 
## ML residual variance (sigma squared): 430.96, (sigma: 20.76)
## Number of observations: 226 
## Number of parameters estimated: 4 
## AIC: NA (not available for weighted model)

4.2 Disease Mapping

Instead of working with a continous rate (assumed Normal), next we will model the case count directly using Poisson regression. We will compare these to a naive analysis where we don’t borrow information around neighboring counties.

Y = dat.areal$Case
P = dat.areal$Population
X = (dat.areal$Income-mean (dat.areal$Income))/1000 #center and scale to per $1,000

#First define adjaceny matrix
nb = poly2nb (dat.areal_proj)
W = nb2mat (nb, style = "B")  #B = binary

The maximum likelihood estimate for the incidence rate is given by \(Y_i/P_i\). We also calculate the standard error associated with \(log (Y_i/P_i)\), which we will compare to estimates from spatial models.

#MLE Estimates
u.mle = Y/P
se.mle = sqrt(Y)/P

We will use INLA here to perform Bayesian estimation. INLA is an efficient approach to conduct Bayesian analysis with hierarchical models when the latent variables (random effects) are Gaussian. Here we will assume the random effects are either iid (exchangeable) or follow have a spatial structure (i.e., besag or propor Leroux). Specific parametrization of these random effect distributions and priors for hyperparameters can be found here (https://inla.r-inla-download.org/r-inla.org/doc/latent/).

#Some data wrangling first
dat.areal$areal_ID = 1:nrow (dat.areal) #Create a county ID from 1 to 226

#INLA needs a specific summary of adjacency structure
nb2INLA ("adj.txt", nb) #write out a file
G <- inla.read.graph(filename = "adj.txt") #read in to get INLA's graph format

#Fit exchangeable (random effect model)
#Note: "E" here is the offset and it does not need to be log-transofrmed
fit.exch = inla (Cases~1+f(areal_ID), E= Population, family = "poisson", data = dat.areal,
                 control.compute = list(dic = TRUE, waic = TRUE, return.marginals.predictor=TRUE))

#Fit improper CAR model (besag)
fit.iCAR = inla (Cases~1+f(areal_ID, model = "besag", graph = G), E= Population, family = "poisson", data = dat.areal,control.compute = list(dic = TRUE, waic = TRUE))

#Fit proper CAR model (Leroux)
fit.pCAR = inla (Cases~1+f(areal_ID, model = "besagproper2", graph = G), E= Population, family = "poisson", data = dat.areal,control.compute = list(dic = TRUE, waic = TRUE))

#Fit BYM model (besag + idd)
#Need to create another set of county IDs because we have two random effect distributions
dat.areal$areal_ID_2 = 1:nrow (dat.areal) #Create a county ID from 1 to 226
fit.conv = inla (Cases~1+f(areal_ID, model = "besag", graph = G) + 
                   f(areal_ID_2), E= Population, family = "poisson", data = dat.areal,control.compute = list(dic = TRUE, waic = TRUE))

#We can use summary to get a lot of model information.
summary (fit.conv)
## 
## Call:
##    c("inla.core(formula = formula, family = family, contrasts = contrasts, 
##    ", " data = data, quantiles = quantiles, E = E, offset = offset, ", " 
##    scale = scale, weights = weights, Ntrials = Ntrials, strata = strata, 
##    ", " lp.scale = lp.scale, link.covariates = link.covariates, verbose = 
##    verbose, ", " lincomb = lincomb, selection = selection, control.compute 
##    = control.compute, ", " control.predictor = control.predictor, 
##    control.family = control.family, ", " control.inla = control.inla, 
##    control.fixed = control.fixed, ", " control.mode = control.mode, 
##    control.expert = control.expert, ", " control.hazard = control.hazard, 
##    control.lincomb = control.lincomb, ", " control.update = 
##    control.update, control.lp.scale = control.lp.scale, ", " 
##    control.pardiso = control.pardiso, only.hyperparam = only.hyperparam, 
##    ", " inla.call = inla.call, inla.arg = inla.arg, num.threads = 
##    num.threads, ", " blas.num.threads = blas.num.threads, keep = keep, 
##    working.directory = working.directory, ", " silent = silent, inla.mode 
##    = inla.mode, safe = FALSE, debug = debug, ", " .parent.frame = 
##    .parent.frame)") 
## Time used:
##     Pre = 0.371, Running = 0.681, Post = 0.021, Total = 1.07 
## Fixed effects:
##               mean    sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) -5.569 0.023     -5.616   -5.569     -5.523   NA   0
## 
## Random effects:
##   Name     Model
##     areal_ID Besags ICAR model
##    areal_ID_2 IID model
## 
## Model hyperparameters:
##                          mean    sd 0.025quant 0.5quant 0.975quant mode
## Precision for areal_ID   1.77 0.478      0.949     1.73       2.81   NA
## Precision for areal_ID_2 9.87 3.194      5.428     9.27      17.81   NA
## 
## Deviance Information Criterion (DIC) ...............: 1921.25
## Deviance Information Criterion (DIC, saturated) ....: -600.02
## Effective number of parameters .....................: 213.76
## 
## Watanabe-Akaike information criterion (WAIC) ...: 1863.59
## Effective number of parameters .................: 112.78
## 
## Marginal log-Likelihood:  -1429.88 
##  is computed 
## Posterior summaries for the linear predictor and the fitted values are computed
## (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
#Get the fixed effects out
fixed.eff = rbind (fit.exch$summary.fixed, fit.iCAR$summary.fixed, fit.pCAR$summary.fixed, fit.conv$summary.fixed)
fixed.eff = data.frame (Model = c("Exch", "iCAR", "pCAR", "Conv"), round(fixed.eff, 2))
fixed.eff
##              Model  mean   sd X0.025quant X0.5quant X0.975quant mode kld
## (Intercept)   Exch -5.57 0.04       -5.66     -5.57       -5.49   NA   0
## (Intercept)1  iCAR -5.57 0.01       -5.59     -5.57       -5.55   NA   0
## (Intercept)2  pCAR -5.57 0.21       -5.99     -5.57       -5.15   NA   0
## (Intercept)3  Conv -5.57 0.02       -5.62     -5.57       -5.52   NA   0
#INLA reports precision = 1/variance. Let's convert them using the estimated posterior marginal distribution
#First create a custom function
prec_to_var = function (marg){ data.frame(inla.zmarginal(inla.tmarginal(function(x){(1/x)},marg), silent=TRUE))}

var.est = rbind ( prec_to_var(fit.exch$marginals.hyperpar[[1]]), 
                  prec_to_var(fit.iCAR$marginals.hyperpar[[1]]), 
                  prec_to_var(fit.pCAR$marginals.hyperpar[[1]]),
                  prec_to_var(fit.conv$marginals.hyperpar[[1]]),
                  prec_to_var(fit.conv$marginals.hyperpar[[2]]))
var.est = data.frame (Model = c("Exch", "iCAR", "pCAR", "Conv", "Conv"),
                         Type = c("Sigma2", "Tau2", "Tau2", "Tau2", "Sigma2"), round(var.est, 2))
var.est
##   Model   Type mean   sd quant0.025 quant0.25 quant0.5 quant0.75 quant0.975
## 1  Exch Sigma2 0.42 0.04       0.35      0.39     0.42      0.45       0.51
## 2  iCAR   Tau2 1.21 0.12       1.00      1.13     1.20      1.29       1.47
## 3  pCAR   Tau2 1.09 0.14       0.86      1.00     1.08      1.18       1.40
## 4  Conv   Tau2 0.61 0.18       0.36      0.48     0.58      0.70       1.05
## 5  Conv Sigma2 0.11 0.03       0.06      0.09     0.11      0.13       0.18

Let’s compared the estimated relative risks across models. We first collect estimates and their error from INLA models (scaled to per 10,000 people).

## Extract estimates from each INLA model and scale to per 10,000 population
dat.areal$rr.mle = u.mle*10000
dat.areal$se.mle = se.mle*10000

dat.areal$rr.exch = fit.exch$summary.fitted.values$mean*10000
dat.areal$se.exch = fit.exch$summary.fitted.values$sd*10000

dat.areal$rr.iCAR = fit.iCAR$summary.fitted.values$mean*10000
dat.areal$se.iCAR = fit.iCAR$summary.fitted.values$sd*10000

dat.areal$rr.conv = fit.conv$summary.fitted.values$mean*10000
dat.areal$se.conv = fit.conv$summary.fitted.values$sd*10000

We see that the point estimates are pretty similar across models and they capture similar spatial patterns.

line01 <- function(x,y,...){abline(0,1, col = 4,lwd=2);  points(x,y,col=2)}
plot(data.frame(MLE = dat.areal$rr.mle, Exchangeable = dat.areal$rr.exch, 
                   iCAR = dat.areal$rr.iCAR, Convolution = dat.areal$rr.conv), panel=line01)

plot.dat = data.frame (FIPS = dat.areal$FIPS,
                  type =rep(c("MLE", "Exchangeable", "iCAR", "Convolution"), each = nrow (dat.areal)),
                  rr = c(as.matrix(st_drop_geometry(dat.areal[,c("rr.mle", "rr.exch", "rr.iCAR", "rr.conv")]))))
plot.dat$type = factor(plot.dat$type, levels = c("MLE", "Exchangeable", "iCAR", "Convolution"))
plot.dat = merge (dat.areal, plot.dat, by = "FIPS")
ggplot () + geom_sf (data = plot.dat, aes (fill = rr)) + facet_wrap(~type)+
  scale_fill_gradient2(low = "white",high = "red", name = "Rate")

Let’s just compare the convoluation model and the naive estimates. We see that the larger differences in county-specific relative risks are observed in counties with less cases and smaller population. The difference in precision (measured by the standard error of relative estimates) also depends on population and case number.

par (mfrow=c(1,2))
plot ( sqrt (Y), dat.areal$rr.mle - dat.areal$rr.conv,
       ylab = "Differnece in RR", xlab = "SQRT (1/Cases)"); abline (0,0, col = "blue")
plot ( log(P), dat.areal$rr.mle - dat.areal$rr.conv,
       ylab = "Differnece in RR", xlab = "Ln (Population)"); abline (0,0, col = "blue")

par (mfrow=c(1,2))
plot ( sqrt (Y), dat.areal$se.mle - dat.areal$se.conv,
       ylab = "Differnece in SE(RR)", xlab = "SQRT (1/Cases)"); abline (0,0, col = "blue")
plot ( log(P), dat.areal$se.mle - dat.areal$se.conv,
       ylab = "Differnece in SE(RR)", xlab = "Ln (Population)"); abline (0,0, col = "blue")

Finally, let’s add household meduab income as a spatial covariate. We see that the spatial model gives a stronger negative association between income and chlamydia rates. The standard error is also much larger compared to a standard Poisson regression.

dat.areal$Income_center = (dat.areal$Income-mean (dat.areal$Income))/1000
fit = inla (Cases~Income_center+f(areal_ID, model = "besag", graph = G) + 
                   f(areal_ID_2), E= Population, family = "poisson", data = dat.areal,control.compute = list(dic = TRUE, waic = TRUE))

summary (fit)
## 
## Call:
##    c("inla.core(formula = formula, family = family, contrasts = contrasts, 
##    ", " data = data, quantiles = quantiles, E = E, offset = offset, ", " 
##    scale = scale, weights = weights, Ntrials = Ntrials, strata = strata, 
##    ", " lp.scale = lp.scale, link.covariates = link.covariates, verbose = 
##    verbose, ", " lincomb = lincomb, selection = selection, control.compute 
##    = control.compute, ", " control.predictor = control.predictor, 
##    control.family = control.family, ", " control.inla = control.inla, 
##    control.fixed = control.fixed, ", " control.mode = control.mode, 
##    control.expert = control.expert, ", " control.hazard = control.hazard, 
##    control.lincomb = control.lincomb, ", " control.update = 
##    control.update, control.lp.scale = control.lp.scale, ", " 
##    control.pardiso = control.pardiso, only.hyperparam = only.hyperparam, 
##    ", " inla.call = inla.call, inla.arg = inla.arg, num.threads = 
##    num.threads, ", " blas.num.threads = blas.num.threads, keep = keep, 
##    working.directory = working.directory, ", " silent = silent, inla.mode 
##    = inla.mode, safe = FALSE, debug = debug, ", " .parent.frame = 
##    .parent.frame)") 
## Time used:
##     Pre = 0.402, Running = 1.1, Post = 0.0892, Total = 1.59 
## Fixed effects:
##                 mean    sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept)   -5.564 0.016     -5.597   -5.564     -5.531   NA   0
## Income_center -0.034 0.005     -0.044   -0.034     -0.024   NA   0
## 
## Random effects:
##   Name     Model
##     areal_ID Besags ICAR model
##    areal_ID_2 IID model
## 
## Model hyperparameters:
##                           mean    sd 0.025quant 0.5quant 0.975quant mode
## Precision for areal_ID    1.48 0.225       1.02     1.46       1.92   NA
## Precision for areal_ID_2 23.45 9.370      11.69    21.94      49.54   NA
## 
## Deviance Information Criterion (DIC) ...............: 1919.62
## Deviance Information Criterion (DIC, saturated) ....: -601.65
## Effective number of parameters .....................: 211.08
## 
## Watanabe-Akaike information criterion (WAIC) ...: 1864.94
## Effective number of parameters .................: 113.08
## 
## Marginal log-Likelihood:  -1416.25 
##  is computed 
## Posterior summaries for the linear predictor and the fitted values are computed
## (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
prec_to_var(fit$marginals.hyperpar[[1]])
##        mean        sd quant0.025 quant0.25  quant0.5 quant0.75 quant0.975
## 1 0.6938514 0.1133778  0.5252033 0.6115126 0.6742112 0.7570979  0.9648019
prec_to_var(fit$marginals.hyperpar[[2]])
##         mean         sd quant0.025  quant0.25   quant0.5  quant0.75 quant0.975
## 1 0.04837744 0.01610103 0.02108057 0.03640378 0.04725156 0.05884403 0.08283049
##Compare to other GLM
dat.areal$logPop = log(dat.areal$Population)
summary (glm (Cases~offset (logPop) + Income_center, family = "poisson", data = dat.areal))
## 
## Call:
## glm(formula = Cases ~ offset(logPop) + Income_center, family = "poisson", 
##     data = dat.areal)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -19.140   -7.805   -2.782    0.299   42.374  
## 
## Coefficients:
##                 Estimate Std. Error  z value Pr(>|z|)    
## (Intercept)   -5.1873009  0.0043950 -1180.27   <2e-16 ***
## Income_center -0.0228930  0.0003785   -60.49   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 21775  on 225  degrees of freedom
## Residual deviance: 17893  on 224  degrees of freedom
## AIC: 19376
## 
## Number of Fisher Scoring iterations: 4
summary (glm (Cases~offset (logPop) + Income_center, family = "quasipoisson", data = dat.areal))
## 
## Call:
## glm(formula = Cases ~ offset(logPop) + Income_center, family = "quasipoisson", 
##     data = dat.areal)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -19.140   -7.805   -2.782    0.299   42.374  
## 
## Coefficients:
##                Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)   -5.187301   0.038599 -134.390  < 2e-16 ***
## Income_center -0.022893   0.003324   -6.888 5.66e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 77.13112)
## 
##     Null deviance: 21775  on 225  degrees of freedom
## Residual deviance: 17893  on 224  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 4
#Examine random effects from the model
dat.areal$gamma_est = fit$summary.random[[1]]$mean
dat.areal$gamma_se = fit$summary.random[[1]]$sd
dat.areal$theta_est = fit$summary.random[[2]]$mean
dat.areal$theta_se = fit$summary.random[[2]]$sd
ggplot () + geom_sf (data = dat.areal, aes (fill = gamma_est))+ 
  scale_fill_gradient2(low = "white",high = "red", name = "Gamma")+ggtitle("Spatial Random Effect Estimates")

ggplot () + geom_sf (data = dat.areal, aes (fill = theta_est))+ 
  scale_fill_gradient2(low = "white",high = "red", name = "Theta")+ggtitle("Independent Random Effect Estimates")

>>>>>>> 1229a265105ae69160fe8c28ca27647c2b02910c